
United States Patent and Trademark Office 



UNITED STATES DEPARTMENT OF COMMERCE 
United States Patent and Trademark Office 

Address: COMMISSIONER FOR PATENTS 
P.O. Box 1450 

Alexandria, Virginia 22313-1450 
www.uspto.gov 



APPLICATION NO.: 10/782,955
FILING DATE: 02/23/2004
FIRST NAMED INVENTOR: Chih-Chung Kuo
ATTORNEY DOCKET NO.: KUOC3019/EM
CONFIRMATION NO.: 7575



23364 7590 03/06/2008 

BACON & THOMAS, PLLC 
625 SLATERS LANE 
FOURTH FLOOR 
ALEXANDRIA, VA 22314



EXAMINER: GODBOLD, DOUGLAS
ART UNIT: 2626
PAPER NUMBER:
MAIL DATE: 03/06/2008
DELIVERY MODE: PAPER

Please find below and/or attached an Office communication concerning this application or proceeding. 

The time period for reply, if any, is set in the attached communication. 



PTOL-90A (Rev. 04/07) 



Office Action Summary 


Application No.: 10/782,955
Applicant(s): KUO ET AL
Examiner: Douglas C. Godbold
Art Unit: 2626





The MAILING DATE of this communication appears on the cover sheet with the correspondence address 
Period for Reply 



A SHORTENED STATUTORY PERIOD FOR REPLY IS SET TO EXPIRE 3 MONTH(S) OR THIRTY (30) DAYS, 
WHICHEVER IS LONGER, FROM THE MAILING DATE OF THIS COMMUNICATION. 

- Extensions of time may be available under the provisions of 37 CFR 1.136(a). In no event, however, may a reply be timely filed
after SIX (6) MONTHS from the mailing date of this communication. 

- If NO period for reply is specified above, the maximum statutory period will apply and will expire SIX (6) MONTHS from the mailing date of this communication. 

- Failure to reply within the set or extended period for reply will, by statute, cause the application to become ABANDONED (35 U.S.C. § 133). 
Any reply received by the Office later than three months after the mailing date of this communication, even if timely filed, may reduce any
earned patent term adjustment. See 37 CFR 1.704(b). 

Status 

1) [X] Responsive to communication(s) filed on 11 December 2007.
2a) [X] This action is FINAL. 2b) [ ] This action is non-final.

3) [ ] Since this application is in condition for allowance except for formal matters, prosecution as to the merits is
closed in accordance with the practice under Ex parte Quayle, 1935 C.D. 11, 453 O.G. 213.

Disposition of Claims 

4) [X] Claim(s) 1-18 is/are pending in the application.

4a) Of the above claim(s) is/are withdrawn from consideration.

5) [X] Claim(s) 1-9 is/are allowed.

6) [X] Claim(s) 10-16 is/are rejected.

7) [X] Claim(s) 17 and 18 is/are objected to.

8) [ ] Claim(s) are subject to restriction and/or election requirement.

Application Papers 

9) [ ] The specification is objected to by the Examiner.

10) [ ] The drawing(s) filed on is/are: a) [ ] accepted or b) [ ] objected to by the Examiner.

Applicant may not request that any objection to the drawing(s) be held in abeyance. See 37 CFR 1.85(a).
Replacement drawing sheet(s) including the correction is required if the drawing(s) is objected to. See 37 CFR 1.121(d).

11) [ ] The oath or declaration is objected to by the Examiner. Note the attached Office Action or form PTO-152.

Priority under 35 U.S.C. § 119 

12) [ ] Acknowledgment is made of a claim for foreign priority under 35 U.S.C. § 119(a)-(d) or (f).
a) [ ] All b) [ ] Some* c) [ ] None of:

1. [ ] Certified copies of the priority documents have been received.

2. [ ] Certified copies of the priority documents have been received in Application No.

3. [ ] Copies of the certified copies of the priority documents have been received in this National Stage application from the International Bureau (PCT Rule 17.2(a)).
* See the attached detailed Office action for a list of the certified copies not received. 



Attachment(s) 

1) [ ] Notice of References Cited (PTO-892)
2) [ ] Notice of Draftsperson's Patent Drawing Review (PTO-948)
3) [ ] Information Disclosure Statement(s) (PTO/SB/08), Paper No(s)/Mail Date
4) [ ] Interview Summary (PTO-413), Paper No(s)/Mail Date
5) [ ] Notice of Informal Patent Application
6) [ ] Other:

U.S. Patent and Trademark Office

PTOL-326 (Rev. 08-06) Office Action Summary, Part of Paper No./Mail Date 20080131



Application/Control Number: 10/782,955

Art Unit: 2626

DETAILED ACTION 

1. This Office action is in response to correspondence filed 11 December 2007 in
reference to application 10/782,955. Claims 1-18 are pending in the application and
have been examined.

Response to Amendment 

2. The amendments filed 11 December 2007 have been accepted and considered
in this Office action. Claims 1 and 18 have been amended.

Response to Arguments 

3. Applicant's arguments (see Remarks, pages 8-12) filed 11 December 2007 with
respect to claims 1-9 have been fully considered and are persuasive. The rejections of
claims 1-9 under 35 U.S.C. 101, 102, and 103 have been withdrawn. The rejections of
claims 10-16 and 18 under 35 U.S.C. 102 have also been withdrawn.

4. Applicant's arguments filed 11 December 2007 with respect to claims 10-16 have
been fully considered, but they are not persuasive. Although claim 10 recites a segment
confidence measure verification and a phonetic verification, it does not recite the
limitations argued on page 10 of the Remarks. Claim 10 does not recite a
"segment confidence measure verification of all cutting points of N test speech unit
segments to determine if the cutting points of the test speech unit segments are
correct." Claim 10 only recites a segmental verifier for verifying the correctness of the
cutting points of test speech unit segments by obtaining a segmental confidence
measure. Claim 10 does not require that the cutting points of all speech unit segments
be verified. Therefore, the rejection of claim 10 and its dependent claims under
35 U.S.C. 103 was proper.

Claim Rejections - 35 USC § 103 

5. The text of those sections of Title 35, U.S. Code not included in this action can 
be found in a prior Office action. 

6. Claims 10-12 and 14-16 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Chou et al. (Corpus-Based Mandarin Speech Synthesis with 
Contextual Syllabic Units Based on Phonetic Properties) in view of Modi et al. (US 
Patent 6,125,345). 

7. Consider claim 10, Chou teaches an automatic speech segmentation and 
verification system (Figure 1, automatic segmentation.) comprising: 

a database for storing a recorded speech corpus, the recorded speech corpus 
corresponding to a known text script, the known text script defining phonetic information 
with N phonetic units (With an accompanying orthographic transcription, the corpus can 
be segmented by labeling with the HMMs, page 894, column 1, line 27. In Figure 1, step 1,
the waveform and transcription are input. The transcription of the waveform would
inherently contain the N phonetic units of the waveform. A database would be inherent
in order to allow for processing by a computer);

a segmenting unit, for segmenting the recorded speech corpus into N test 
speech unit segments referring to the phonetic information of the N phonetic units in the 
known text script (With an accompanying orthographic transcription, the corpus can be 
segmented by labeling with the HMMs, page 894, column 1, line 27. In Figure 1, the waveform
and transcription are both input to the SI HMM segmentation step, showing that both
would be considered.);

a segment-confidence-measure verifying unit, for verifying segment confidence 
measures of N cutting points of the test speech unit segments to determine if the N 
cutting points of the test speech unit segments are correct (To evaluate the effects of 
the whole process, the output after the manual correction is set as the reference. The 
errors are calculated as the difference between the determined boundaries and the 
reference boundaries; page 894, column 2, line 40.); 

a determining unit, for determining acceptance of the phonetic unit by comparing 
a segment reliability of the test speech unit segments to a predetermined threshold 
value; wherein if the combined confidence measure is greater than the predetermined
threshold value, the phonetic unit is accepted (To evaluate the effects of the whole process,
the output after the manual correction is set as the reference. The errors are calculated 
as the difference between the determined boundaries and the reference boundaries. 
The segmentation rate is defined as the percentage of errors within 10ms and 20ms.). 
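The error measure quoted from Chou (percentage of boundary errors within 10 ms and 20 ms of the manually corrected reference) can be illustrated with a short Python sketch. It assumes boundaries are given in milliseconds and that an error exactly equal to the tolerance counts as "within" it; the function name `segmentation_rate` is hypothetical and is not taken from Chou or from the application.

```python
from typing import Sequence


def segmentation_rate(
    determined_ms: Sequence[float],
    reference_ms: Sequence[float],
    tolerance_ms: float,
) -> float:
    """Percentage of boundaries whose absolute error is within tolerance_ms.

    Errors are the differences between automatically determined boundaries
    and the manually corrected reference boundaries (both in milliseconds).
    An error equal to the tolerance is counted as within it (an assumption).
    """
    if len(determined_ms) != len(reference_ms):
        raise ValueError("boundary lists must have the same length")
    if not determined_ms:
        raise ValueError("no boundaries given")
    within = sum(
        1 for d, r in zip(determined_ms, reference_ms) if abs(d - r) <= tolerance_ms
    )
    return 100.0 * within / len(determined_ms)


# Absolute errors here are 2, 11, 2 and 30 ms.
determined = [102.0, 251.0, 398.0, 530.0]
reference = [100.0, 240.0, 400.0, 500.0]
print(segmentation_rate(determined, reference, 10.0))  # 50.0
print(segmentation_rate(determined, reference, 20.0))  # 75.0
```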

Chou does not specifically teach: 
a phonetic-confidence-measure verifying unit, for verifying phonetic confidence
measures of the test speech unit segments to determine if the test speech unit
segments correspond to the known text script; and

considering the phonetic confidence measures in determining the acceptance
of the phonetic unit.

In the same field of speech verification, Modi teaches: 

a phonetic-confidence-measure verifying unit, for verifying phonetic confidence 
measures of the test speech unit segments to determine if the test speech unit 
segments correspond to the known text script (Regardless of the conventional 
procedure used to train the recognition HMMs 126 and the verification HMMs 134, in 
operation, the conventional verification subsystem 130 of the automated speech 
recognition system 100 uses the verification procedure shown in FIG. 5 to determine 
whether the recognized utterance is accepted or rejected; column 7 line 54. Figure 5 
shows keywords and anti-keywords with probabilities that are used to come up with a 
likelihood ratio. One of ordinary skill in the art could appreciate that the HMMs used for 
recognition could be limited to the ones used for segmentation, and the same 
verification principles of Modi would apply.); and 

considering the phonetic confidence measures in determining the acceptance
of the phonetic unit (the conventional verification subsystem 130 of the automated
speech recognition system 100 uses the verification procedure shown in FIG. 5 to
determine whether the recognized utterance is accepted or rejected; column 7, line 54.).

Therefore, it would have been obvious to one of ordinary skill in the art at the time
of the invention to combine the confidence measure of Modi with the segmentation and
verification of Chou in order to provide assurance not only that the phonemes are
segmented in the right place but also that they are the correct segments.

8. Consider claim 11, Chou teaches the system as claimed in claim 10, wherein the
segmenting unit performs the following steps: 

using a hidden Markov model (HMM) to cut the recorded speech corpus into N 
test speech unit segments referring to the phonetic information of the N phonetic units in 
the known text script, wherein each test speech unit segment is defined as 
correspondingly having an initial cutting point (For automatic processing, the boundary 
correction rules are applied instead of the human correction. These prior described 
rules are based on the knowledge from the observations in human correction 
procedures. The outputs of SD HMMs are accepted as the initial boundaries; page 894, 
column 2, line 20.); 

performing a fine adjustment on the initial cutting point of the test speech unit 
segment according to at least one feature factor corresponding to each test speech unit 
segment and calculating at least one cutting point fine adjustment value corresponding 
to each test speech unit segment (For automatic processing, the boundary correction 
rules are applied instead of the human correction. These prior described rules are 
based on the knowledge from the observations in human correction procedures. The 
outputs of SD HMMs are accepted as the initial boundaries. The program then searches 
in a local area for the acoustic features that match the phonetic properties of the units.
The features include RMS power, voicing probability and FFT spectrogram derived from 
ESPS programs. The window sizes are varied from 5ms to 20ms according to the 
features and phonetic types of units. For example, a 5ms window of RMS power is 
applied to locate a plosive because there is a short burst of energy when the sound is 
released. If the specified acoustic features are not found in that area, the boundary is 
left no change; page 894, column 2, lines 20-35.); and 

integrating the initial cutting point and the cutting point fine adjustment value of 
the test speech unit segment to obtain a cutting point of the test speech unit segment 
(the adjusted boundaries are further processed to update the parameters of the SD 
HMMs. These procedures are recursively performed until the average alternation of
boundaries is under a threshold; page 894, column 2, line 34.). 
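The local search quoted from Chou (page 894, column 2) can be sketched for a single energy-based rule. This is a hypothetical illustration, not Chou's actual rule set: it assumes frame-level RMS energy values are already computed, moves the HMM boundary to the frame with the largest energy rise within a fixed radius (as for the burst of a plosive), and leaves the boundary unchanged when no rise of at least `min_rise` is found. The names `refine_boundary`, `radius`, and `min_rise` are illustrative only.

```python
from typing import Sequence


def refine_boundary(rms: Sequence[float], initial: int, radius: int, min_rise: float) -> int:
    """Move an HMM boundary (frame index) to the largest nearby RMS energy rise.

    Searches frames initial - radius .. initial + radius (clipped to the signal)
    for the largest increase rms[t] - rms[t - 1]. If no increase of at least
    `min_rise` is found, the initial boundary is returned unchanged.
    """
    lo = max(1, initial - radius)
    hi = min(len(rms) - 1, initial + radius)
    candidates = [(rms[t] - rms[t - 1], t) for t in range(lo, hi + 1)]
    if not candidates:
        return initial
    best_rise, best_frame = max(candidates)
    return best_frame if best_rise >= min_rise else initial


# Low energy followed by a burst at frame 4; the HMM placed the boundary at frame 3.
rms = [0.1, 0.1, 0.1, 0.1, 0.9, 0.8, 0.7]
print(refine_boundary(rms, initial=3, radius=2, min_rise=0.3))  # 4
# Flat energy: no qualifying feature is found, so the boundary is left unchanged.
print(refine_boundary([0.1] * 7, initial=3, radius=2, min_rise=0.3))  # 3
```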

9. Consider claim 12, Chou teaches the system as claimed in claim 11, wherein the
feature factor of the test speech unit segment is a neighboring cutting point of the initial
cutting point (The program then searches in a local area for the acoustic features that
match the phonetic properties of the units; page 894, column 2, line 25.).

10. Consider claim 14, Chou teaches the system as claimed in claim 11, wherein the
feature factor of the test speech unit segment is an energy value of the test speech unit
segment (The features include RMS power, voicing probability and FFT spectrogram
derived from ESPS programs; page 894, column 2, line 27.).
11. Consider claim 15, Chou teaches the system as claimed in claim 14, wherein the
energy value is an energy value of a band pass signal and a high pass signal retrieved
from a speaker-dependent band (The features include RMS power, voicing probability
and FFT spectrogram derived from ESPS programs; page 894, column 2, line 27. The
FFT spectrogram is made up of band-pass signals, and a group of FFT coefficients can
be considered together to create a high-pass signal. These signals would in fact be
speaker dependent, as speaker-dependent HMMs are used.).

12. Consider claim 16, Chou teaches the system as claimed in claim 11, wherein each
cutting point fine adjustment value has a weighted value, and the cutting point of the test
speech unit segment is a weighted average of the initial cutting point and the cutting point
fine adjustment value (These procedures are recursively performed until the average
alternation of boundaries is under a threshold; page 894, column 2, line 34. This is in
effect an average of the initial cutting point and the adjusted value.).
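For the weighted-average limitation of claim 16, a minimal Python sketch follows. It assumes each fine adjustment value is expressed as a candidate cutting point in the same time unit as the initial cutting point, paired with its weight; the function name `integrate_cutting_point` and the weights in the example are hypothetical.

```python
from typing import Sequence, Tuple


def integrate_cutting_point(
    initial: float,
    initial_weight: float,
    adjustments: Sequence[Tuple[float, float]],
) -> float:
    """Weighted average of the initial cutting point and fine-adjustment values.

    `adjustments` holds (cutting_point, weight) pairs, one per feature factor.
    """
    total_weight = initial_weight + sum(w for _, w in adjustments)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = initial * initial_weight + sum(p * w for p, w in adjustments)
    return weighted_sum / total_weight


# Initial HMM point at 100 ms (weight 2); two feature factors suggest 110 ms and 106 ms.
print(integrate_cutting_point(100.0, 2.0, [(110.0, 1.0), (106.0, 1.0)]))  # 104.0
```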

13. Claim 13 is rejected under 35 U.S.C. 103(a) as being unpatentable over Chou in 
view of Modi as applied to claim 2 above, and further in view of Toledano et al. (Trying to
Mimic Human Segmentation of Speech Using HMM and Fuzzy Logic Post-Correction
Rules). 
14. Consider claim 13, Chou in view of Modi teaches the system as claimed in claim
2, but does not specifically teach that the feature factor of the test speech unit
segment is a zero crossing rate (ZCR) of the test speech unit segment.

In the same field of segmentation verification, Toledano teaches the feature 
factor of the test speech unit segment is a zero crossing rate (ZCR) of the test speech 
unit segment (Signal features at a time position are computed based on two windows of 
fixed width. Among these features is the zero crossing rate; page 4, column 1,
paragraph 1.). 
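The zero crossing rate feature relied on from Toledano, computed over two fixed-width windows at a time position, can be sketched in Python as below. The window width, the convention that a zero sample counts as non-negative, and the function names are assumptions for illustration only.

```python
from typing import Sequence, Tuple


def zero_crossing_rate(samples: Sequence[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def zcr_two_windows(samples: Sequence[float], position: int, width: int) -> Tuple[float, float]:
    """ZCR of the fixed-width windows immediately before and after `position`."""
    left = samples[max(0, position - width):position]
    right = samples[position:position + width]
    return zero_crossing_rate(left), zero_crossing_rate(right)


# Smooth (voiced-like) samples followed by rapidly alternating (fricative-like) samples.
signal = [0.5, 0.6, 0.7, 0.6, 0.3, -0.3, 0.3, -0.3]
print(zcr_two_windows(signal, position=4, width=4))  # (0.0, 1.0)
```

A large difference between the two window values suggests a boundary between acoustically different segments at that position.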

Therefore it would have been obvious to one of ordinary skill in the art at the time 
of the invention to combine the use of zero crossing rate as taught by Toledano with the 
verification method of Chou and Modi in order to provide another tool for assessing the 
accuracy of the segmentation. 

Allowable Subject Matter 

15. Claims 1-9 are allowed. 

16. The following is a statement of reasons for the indication of allowable subject 
matter: 

The prior art of record, specifically Chou et al. and Modi et al., does not
specifically teach or fairly suggest the limitations "a segment-confidence-measure
verifying step, for verifying segment confidence measures of all cutting points of the N
test speech unit segments to determine if the N cutting points of the N test speech unit
segments are correct; a phonetic-confidence-measure verifying step, for verifying
phonetic confidence measures of the test speech unit segments to determine if the test
speech unit segments correspond to the known text script; and a determining step, for
determining acceptance of the phonetic unit by comparing a combination of the
segment confidence measures reliability and the phonetic confidence measures of the
test speech unit segments to a predetermined threshold value; wherein if the combined
confidence measure is greater than the predetermined threshold value, the phonetic
unit is accepted for output." when combined with the other limitations of claim 1.
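The determining step quoted above compares a combination of the two confidence measures with a predetermined threshold. The claim language does not specify how the measures are combined, so the Python sketch below assumes a simple weighted sum with hypothetical weights `w_s` and `w_v`; only the strict "greater than" comparison comes from the quoted limitation.

```python
def accept_phonetic_unit(
    cm_s: float,
    cm_v: float,
    threshold: float,
    w_s: float = 0.5,
    w_v: float = 0.5,
) -> bool:
    """Accept the unit if the combined confidence measure exceeds the threshold.

    The weighted-sum combination and the default weights are assumptions;
    the quoted claim language only requires "a combination".
    """
    combined = w_s * cm_s + w_v * cm_v
    return combined > threshold


print(accept_phonetic_unit(0.9, 0.5, threshold=0.6))  # True
print(accept_phonetic_unit(0.4, 0.2, threshold=0.6))  # False
```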



17. Claims 17 and 18 are objected to as being dependent upon a rejected base
claim, but would be allowable if rewritten in independent form including all of the 
limitations of the base claim and any intervening claims. 

18. Consider claim 17, the combination of Chou and Modi does not teach or fairly
suggest the system as claimed in claim 10, wherein in the segment-confidence-measure
step, each segment confidence measure of the test speech unit segment is:

$$CM_S = \max\Big(1 - h(D) - \sum_{s} g\big(c(s), f(s)\big),\ 0\Big)$$

where $h(D) = K\big(\sum_i w_i \lvert d_i - \bar{d} \rvert\big)$, $D$ is a vector of multiple expert
decisions of the cutting point, $d_i$ is a cutting point decision in $D$, $\bar{d} = \varphi(D)$ is a
final decision of the cutting point, $K(x)$ is a monotonically increasing function that maps
a non-negative variable $x$ into a value between 0 and 1, $g(c(s), f(s))$ is a cost function
with a value ranging from 0 to 1, $s$ is a segment, $c(s)$ is a type category of the
segment $s$, and $f(s)$ are acoustic features of the segment.
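The segment confidence measure of claim 17 can be evaluated numerically as sketched below in Python. The choice K(x) = 1 - exp(-x / scale) (monotonically increasing, mapping non-negative x into [0, 1)), the weighted mean used as the final decision of the cutting point, and the precomputed cost values are all assumptions; the claim language as recited here does not fix these functions.

```python
import math
from typing import Callable, Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    # Assumed stand-in for the final decision of the cutting point; not defined by the claim.
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def segment_confidence(
    decisions: Sequence[float],
    weights: Sequence[float],
    costs: Sequence[float],
    K: Callable[[float], float],
) -> float:
    """CM_S = max(1 - h(D) - sum(costs), 0), where h(D) = K(sum_i w_i * |d_i - d_bar|).

    `costs` holds precomputed cost values g(c(s), f(s)), each in [0, 1].
    """
    d_bar = weighted_mean(decisions, weights)
    spread = sum(w * abs(d - d_bar) for w, d in zip(weights, decisions))
    h = K(spread)
    return max(1.0 - h - sum(costs), 0.0)


def example_K(x: float, scale: float = 20.0) -> float:
    # Hypothetical monotonically increasing map from [0, inf) into [0, 1).
    return 1.0 - math.exp(-x / scale)


# Three expert decisions (ms) that nearly agree, equal weights, one cost term.
cms = segment_confidence([100.0, 104.0, 102.0], [1 / 3] * 3, [0.1], example_K)
print(round(cms, 3))  # 0.836
```

Close agreement among the expert decisions and low costs keep the measure near 1; strong disagreement or high costs drive it to the floor of 0.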

19. Consider claim 18, the combination of Chou and Modi does not teach or fairly
suggest the system as claimed in claim 10, wherein each phonetic confidence measure
of the test speech unit segments is determined by:

$$CM_V = \min\{LLR_I,\ LLR_F,\ 0\}$$

where

$$LLR_I = \log P(X_I \mid H_0) - \log P(X_I \mid H_1), \qquad LLR_F = \log P(X_F \mid H_0) - \log P(X_F \mid H_1),$$

$X_I$ is the initial segment of the test speech unit segment, $X_F$ is the final segment of
the test speech unit segment, $H_0$ is a null hypothesis that the test speech unit segment
is recorded correctly, $H_1$ is an alternative hypothesis that the test speech unit segment
is recorded incorrectly, and LLR is a log likelihood ratio.
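The phonetic confidence measure of claim 18 reduces to simple arithmetic once the log-likelihoods are available. The Python sketch below assumes those four log-likelihood values have already been obtained (for example, by scoring the initial and final portions of a segment against models for the two hypotheses); how they are computed is outside the sketch, and the function names are hypothetical.

```python
def log_likelihood_ratio(log_p_h0: float, log_p_h1: float) -> float:
    """LLR = log P(X | H0) - log P(X | H1)."""
    return log_p_h0 - log_p_h1


def phonetic_confidence(
    init_log_p_h0: float,
    init_log_p_h1: float,
    final_log_p_h0: float,
    final_log_p_h1: float,
) -> float:
    """CM_V = min{LLR_I, LLR_F, 0} for the initial (I) and final (F) parts of a segment."""
    llr_i = log_likelihood_ratio(init_log_p_h0, init_log_p_h1)
    llr_f = log_likelihood_ratio(final_log_p_h0, final_log_p_h1)
    return min(llr_i, llr_f, 0.0)


# Initial part favors H0 (LLR_I = 2), final part favors H1 (LLR_F = -2).
print(phonetic_confidence(-10.0, -12.0, -11.0, -9.0))  # -2.0
```

Because of the 0 term, the measure is 0 whenever both parts favor the null hypothesis and becomes negative as soon as either part favors the alternative.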

Conclusion 

20. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later
than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Douglas C. Godbold, whose telephone number is (571)
270-1451. The examiner can normally be reached Monday-Thursday, 7:00am-4:30pm, and
Friday, 7:00am-3:30pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Patrick Edouard, can be reached at (571) 272-7603. The fax phone number
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 